AIbase

# Video Caption Generation

Tarsier 34b
Apache-2.0
Tarsier-34b is an open-source large-scale video-language model that focuses on generating high-quality video captions and reports leading results on multiple public benchmarks.
Video-to-Text, Transformers
omni-research
103
17
Timesformer Bert Video Captioning
A video caption generation model based on the TimeSformer and BERT architectures that generates descriptive captions for video content.
Video-to-Text, Transformers
AlexZigma
83
3
Git Large Vatex
MIT
GIT is a Transformer decoder conditioned on CLIP image tokens and text tokens, designed for tasks like image and video caption generation and visual question answering.
Image-to-Text, Transformers, Supports Multiple Languages
microsoft
267
1
Git Base Vatex
MIT
GIT is a Transformer-based generative image-to-text model, with the base version fine-tuned on the VATEX dataset, suitable for tasks such as image and video caption generation.
Image-to-Text, Transformers, Supports Multiple Languages
microsoft
752
4
© 2025 AIbase